Structured Generative Models of Natural Source Code
نویسندگان
چکیده
We study the problem of building generative models of natural source code (NSC); that is, source code written and understood by humans. Our primary contribution is to describe a family of generative models for NSC that have three key properties: First, they incorporate both sequential and hierarchical structure. Second, we learn a distributed representation of source code elements. Finally, they integrate closely with a compiler, which allows leveraging compiler logic and abstractions when building structure into the model. We also develop an extension that includes more complex structure, refining how the model generates identifier tokens based on what variables are currently in scope. Our models can be learned efficiently, and we show empirically that including appropriate structure greatly improves the models, measured by the probability of generating test programs.
منابع مشابه
Tree-structured Variational Autoencoder
Many kinds of variable-sized data we would like to model contain an internal hierarchical structure in the form of a tree, including source code, formal logical statements, and natural language sentences with parse trees. For such data it is natural to consider a model with matching computational structure. In this work, we introduce a variational autoencoder-based generative model for tree-str...
متن کاملDiscriminative Models for Semi-Supervised Natural Language Learning
An interesting question surrounding semisupervised learning for NLP is: should we use discriminative models or generative models? Despite the fact that generative models have been frequently employed in a semi-supervised setting since the early days of the statistical revolution in NLP, we advocate the use of discriminative models. The ability of discriminative models to handle complex, high-di...
متن کاملSyntactic Topic Models for Language Generation
Since topic models’ inception as probabilistic generative models, it has only been natural to imagine actually applying the generative process to create documents. However, most topic models consist of a generative process that only provides a bag of words which is one critical step short of creating a readable text. With the recent introduction of syntactically sound topic models and structure...
متن کاملStructured classification for multilingual natural language processing
This thesis investigates the application of structured sequence classification models to multilingual natural language processing (NLP). Many tasks tackled by NLP can be framed as classification, where we seek to assign a label to a particular piece of text, be it a word, sentence or document. Yet often the labels which we’d like to assign exhibit complex internal structure, such as labelling a...
متن کاملNatural-Gradient Stochastic Variational Inference for Non-Conjugate Structured Variational Autoencoder
We propose a new variational inference method which uses recognition models for amortized inference in graphical models that contain deep generative models. Unlike many existing approaches, our method can handle non-conjugacy in both the latent graphical model and the deep generative model, and enables fully amortized inference at test time. Our method is based on an extension of a recently pro...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014